The Doubly Correlated Nonparametric Topic Model

Authors

  • Dae Il Kim
  • Erik B. Sudderth
Abstract

Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata.
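
The three ingredients named in the abstract (a Gaussian regression on metadata, a square-root covariance factor, and a stick-breaking construction) can be illustrated with a small generative sketch. This is not the authors' implementation. It assumes a logistic stick-breaking transform, a truncation to K topics, and unit-variance noise, and the names `features`, `eta`, `A`, and `noise_std` are hypothetical. The correlation term `A u` with `u ~ N(0, I)` has covariance `A Aᵀ`, which is the sense in which `A` acts as a square root.

```python
import math
import random


def logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stick_breaking(v):
    """Map real-valued activations v[0..K-1] to truncated topic weights.

    pi[k] = logistic(v[k]) * prod_{l<k} (1 - logistic(v[l])).
    The mass left on the stick after K breaks is returned separately.
    """
    weights, remaining = [], 1.0
    for v_k in v:
        frac = logistic(v_k)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining


def sample_activations(features, eta, A, rng, noise_std=1.0):
    """Draw v[k] = eta[k] . features + A[k] . u + noise, with u ~ N(0, I).

    eta is a K x F list of regression weights (assumed layout);
    A is a K x L factor, so the A u term has covariance A A^T.
    """
    latent_dim = len(A[0])
    u = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    v = []
    for k in range(len(eta)):
        mean = sum(e * f for e, f in zip(eta[k], features))
        corr = sum(a * u_l for a, u_l in zip(A[k], u))
        v.append(mean + corr + rng.gauss(0.0, noise_std))
    return v


rng = random.Random(0)
features = [1.0, 0.5]                          # hypothetical metadata for one document
eta = [[0.2, -0.1], [0.0, 0.3], [-0.5, 0.4]]   # K = 3 topics, F = 2 features
A = [[0.8], [0.8], [-0.6]]                     # K x 1 factor: topics 0 and 1 co-vary
v = sample_activations(features, eta, A, rng)
pi, leftover = stick_breaking(v)
```

Here `pi` holds the weights of the first three topics for one document and `leftover` is the mass assigned to all remaining topics, so `sum(pi) + leftover` equals 1 up to rounding. An unbounded model would keep breaking the leftover stick instead of truncating.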

Similar resources

A New Nonparametric Regression for Longitudinal Data

In many areas of medical research, an analysis of the relation between one response variable and some explanatory variables is desirable. Regression is the most common tool in this situation. If we can assume normality of the response variable, we can use it. In this paper we propose a nonparametric regression that does not assume normality of the response variable, and we focus ...

The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling

The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric mixed membership model—each data point is modeled with a collection of components of different proportions. Though powerful, the HDP makes an assumption that the probability of a component being exhibited by a data point is positively correlated with its proportion within that data point. This might be an undesirable assumptio...

The finite sample performance of semi- and non-parametric estimators for treatment effects and policy evaluation

This paper investigates the finite sample performance of a comprehensive set of semi- and nonparametric estimators for treatment and policy evaluation. In contrast to previous simulation studies which mostly considered semiparametric approaches relying on parametric propensity score estim...

Doubly-nonparametric generalized linear models

We extend nonparametric generalized linear models to allow both the mean curve and the response distribution to be nonparametric. The seemingly intractable task of working with two infinite-dimensional parameters is shown to be reducible to a finite optimization problem, which is easily implemented via existing algorithms. We demonstrate using various examples that the proposed approach can be ...

Nonparametric Bayes Pachinko Allocation

Recent advances in topic models have explored complicated structured distributions to represent topic correlation. For example, the pachinko allocation model (PAM) captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). While PAM provides more flexibility and greater expressive power than previous models like latent Dirichlet allocation ...

Publication date: 2011